Detecting incorrect product names in online sources for product master data

Authors

  • Stephan Karpischek
  • Florian Michahelles
  • Elgar Fleisch
Abstract

The global trade item number (GTIN) is traditionally used to identify trade items and look up corresponding information within industrial supply chains. Recently, consumers have also started using GTINs to access additional product information with mobile barcode scanning applications. Providers of these applications use different sources to provide product names for scanned GTINs. In this paper we analyze data from eight publicly available sources for a set of GTINs scanned by users of a mobile barcode scanning application. Our aim is to measure the correctness of product names in online sources and to quantify the problem of product data quality. We use a combination of string matching and supervised learning to estimate the number of incorrect product names. Our results show that approximately 2% of all product names are incorrect. The applied method is useful for brand owners to monitor the data quality for their products and enables efficient data integration for application providers.
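
To make the string-matching idea concrete, here is a minimal sketch, not the authors' method. It assumes that each source supplies one product name per GTIN and that a name disagreeing strongly with the names from all other sources is a candidate for being incorrect. The function names, the `difflib`-based similarity measure, and the 0.4 threshold are all illustrative assumptions. The supervised-learning step mentioned in the abstract is not covered.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (illustrative choice of metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_suspect_names(names_by_source: dict[str, str], threshold: float = 0.4) -> list[str]:
    """Return the sources whose product name for one GTIN has a low mean
    similarity to the names reported by all other sources.

    Both the threshold and the mean-similarity rule are assumptions made
    for this sketch, not values taken from the paper.
    """
    suspects = []
    for source, name in names_by_source.items():
        others = [n for s, n in names_by_source.items() if s != source]
        if not others:
            continue  # a single source cannot be cross-checked
        mean_sim = sum(similarity(name, other) for other in others) / len(others)
        if mean_sim < threshold:
            suspects.append(source)
    return suspects
```

In practice, a threshold like this would probably need tuning on labeled examples, which is where a supervised classifier could take over from a fixed cutoff.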


Similar resources

Automatic Detection of Structural Changes in Data Warehouses

Data Warehouses provide sophisticated tools for analyzing complex data online, in particular by aggregating data along dimensions spanned by master data. Changes to these master data are a frequent threat to the correctness of OLAP results, in particular for multi-period data analysis, trend calculations, etc. As dimension data might change in underlying data sources without notifying the data ...

Using BMEcat Catalogs as a Lever for Product Master Data on the Semantic Web

To date, the automatic exchange of product information between business partners in a value chain is typically done using Business-to-Business (B2B) catalog standards such as EDIFACT, cXML, or BMEcat. At the same time, the Web of Data, in particular the GoodRelations vocabulary, offers the necessary means to publish highly-structured product data in a machine-readable format. The advantage of th...

Alignment of Product Master Data

Market research draws a coherent picture of the market based on extensive observations of sales acts from numerous data sources. As the data sources refer to the products sold all over the world in different formats and with different keys, the data needs to be aligned to a common product master. The subtle strategies for a large-scale product data alignment, as well as the key structure of com...

Product Name Classification for Product Instance Distinction

Product names with a temporal cue in a product review often refer to several product instances purchased at different times. Previous approaches to product entity recognition and temporal information analysis do not take into account such temporal cues and thus fail to distinguish different product instances. We propose to formulate the resolution of such product names as a classification probl...

Assessing Tools for Coordinating Quality of Master Data in Inter-organizational Product Information Sharing

Product information sharing, i.e., inter-organizational transfer of master data relating to products, is a problematic, error-prone, labor-intensive, and costly process in many companies. This paper presents findings of a focus group interview and case studies at three wholesale trading companies that share product information with hundreds of suppliers. We identify and assess coordination mech...

Journal title:
  • Electronic Markets

Volume 24  Issue 

Pages  -

Publication date 2014